Overfitting Bayesian Mixture Models with an Unknown Number of Components
Authors
Abstract
This paper proposes solutions to three issues pertaining to the estimation of finite mixture models with an unknown number of components: the non-identifiability induced by overfitting the number of components, the mixing limitations of standard Markov chain Monte Carlo (MCMC) sampling techniques, and the related label-switching problem. An overfitting approach is used to estimate the number of components in a finite mixture model via the Zmix algorithm. Zmix provides a bridge between multidimensional samplers and test-based estimation methods, whereby priors are chosen to encourage extra groups to have weights approaching zero. MCMC sampling is made possible by the implementation of prior parallel tempering, an extension of parallel tempering. Given a sufficiently large sample size, Zmix accurately estimates the number of components, the posterior parameter estimates and the allocation probabilities. The results reflect uncertainty in the final model, reporting the range of candidate models and their respective estimated probabilities from a single run. Label switching is resolved with Zswitch, a computationally lightweight method developed for overfitted mixtures that exploits the intuitiveness of allocation-based relabelling algorithms and the precision of label-invariant loss functions. Four simulation studies illustrate Zmix and Zswitch, together with three case studies from the literature. All methods are available as part of the R package Zmix, which can currently be applied to univariate Gaussian mixture models.
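The central idea of the overfitting approach can be illustrated with a deliberately too-large number of components K and a sparse symmetric Dirichlet prior on the weights, so that superfluous components are pushed towards weight zero and the number of occupied components estimates the true number of groups. The following is a minimal base-R sketch of that idea for a univariate Gaussian mixture; it is not the Zmix package API, the function rdirichlet1, the prior settings and the choice alpha = 0.01 are illustrative assumptions, and it omits the prior parallel tempering and Zswitch relabelling steps described in the abstract.

## Minimal sketch: Gibbs sampling for an overfitted univariate Gaussian
## mixture with a sparse symmetric Dirichlet prior on the weights.
## Illustrative only; names and settings are assumptions, not the Zmix API.
set.seed(1)

## Simulated data from a 2-component mixture; we deliberately overfit with K = 6.
y <- c(rnorm(150, -2, 0.5), rnorm(150, 3, 1))
n <- length(y)
K <- 6        # overfitted number of components
alpha <- 0.01 # small Dirichlet concentration pushes empty components towards weight 0

## Conjugate priors: mu_k ~ N(m0, s0^2), 1/sigma_k^2 ~ Gamma(a0, b0)
m0 <- mean(y); s0 <- 2 * sd(y); a0 <- 2; b0 <- var(y)

## Initial values
w   <- rep(1 / K, K)
mu  <- rnorm(K, m0, s0)
tau <- rep(1 / var(y), K)   # component precisions

n_iter <- 2000
n_occupied <- integer(n_iter)

rdirichlet1 <- function(a) { g <- rgamma(length(a), a, 1); g / sum(g) }

for (it in seq_len(n_iter)) {
  ## 1. Sample allocations z_i given weights and component parameters
  logp <- sapply(seq_len(K), function(k)
    log(w[k]) + dnorm(y, mu[k], 1 / sqrt(tau[k]), log = TRUE))
  p <- exp(logp - apply(logp, 1, max))
  z <- apply(p, 1, function(pr) sample.int(K, 1, prob = pr))

  nk <- tabulate(z, nbins = K)

  ## 2. Sample weights from Dirichlet(alpha + n_1, ..., alpha + n_K)
  w <- rdirichlet1(alpha + nk)

  ## 3. Sample component means and precisions from their conjugate conditionals
  for (k in seq_len(K)) {
    yk <- y[z == k]
    prec_post <- 1 / s0^2 + nk[k] * tau[k]
    mean_post <- (m0 / s0^2 + tau[k] * sum(yk)) / prec_post
    mu[k]  <- rnorm(1, mean_post, sqrt(1 / prec_post))
    tau[k] <- rgamma(1, a0 + nk[k] / 2, b0 + 0.5 * sum((yk - mu[k])^2))
  }

  ## Track how many components are occupied (non-empty) at this iteration
  n_occupied[it] <- sum(nk > 0)
}

## Posterior distribution of the number of occupied components (after burn-in)
table(n_occupied[-(1:500)])

Because the number of occupied components is label-invariant, no relabelling is needed for this particular summary; a Zswitch-style relabelling step becomes important when reporting component-specific parameter estimates and allocation probabilities.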
Similar references
Model Selection for Mixture Models Using Perfect Sample
We have considered a perfect sampling method for model selection of finite mixture models with either a known (fixed) or an unknown number of components, which can be applied in the most general setting, without assumptions on the relation between the rival models and the true distribution: either, both, or neither may be well-specified or mis-specified, and they may be nested or non-nested. We consider mixt...
Increasing the mixture components of non-uniform HMM structures based on a variational Bayesian approach
We propose using the Variational Bayesian (VB) approach for automatically creating non-uniform, context-dependent HMM topologies. Although the Maximum Likelihood (ML) criterion is generally used to create HMM topologies, it has an overfitting problem. Recently, to avoid this problem, the VB approach has been applied to create acoustic models for speech recognition. We introduce the VB approach ...
Optimal Bayesian Classifier for Land Cover Classification Using Landsat TM Data
An optimal Bayesian classifier using mixture distribution class models with joint learning of loss and prior probability functions is proposed for automatic land cover classification. The probability distribution for each land cover class is more realistically modeled as a population of Gaussian mixture densities. A novel two-stage learning algorithm is proposed to learn the Gaussian mixture mo...
Speech modeling using variational Bayesian mixture of Gaussians
The topic of this paper is speech modeling using the Variational Bayesian Mixture of Gaussians algorithm proposed by Hagai Attias (2000). Several mixtures of Gaussians were trained for representing cepstrum vectors computed from the TIMIT database. The VB-MOG algorithm was compared to the standard EM algorithm. VB-MOG was clearly better, its convergence was faster, there was no tendency to over...
Model-based clustering based on sparse finite Gaussian mixtures
In the framework of Bayesian model-based clustering based on a finite mixture of Gaussian distributions, we present a joint approach to estimate the number of mixture components and identify cluster-relevant variables simultaneously as well as to obtain an identified model. Our approach consists in specifying sparse hierarchical priors on the mixture weights and component means. In a deliberate...
Journal title:
Volume 10, Issue
Pages: -
Publication year: 2015